Dataset Filtering Techniques in Constraint-Based Frequent Pattern Mining
Abstract
Many data mining techniques involve discovering patterns that occur frequently in the source dataset. Typically, the goal is to discover all patterns whose frequency in the dataset exceeds a user-specified threshold. However, users often want to restrict the set of patterns to be discovered by adding extra constraints on the structure of patterns. Data mining systems should be able to exploit such constraints to speed up the mining process. In this paper, we focus on improving the efficiency of constraint-based frequent pattern mining by using dataset filtering techniques. Dataset filtering conceptually transforms a given data mining task into an equivalent one operating on a smaller dataset. We present transformation rules for various classes of patterns (itemsets, association rules, and sequential patterns) and discuss implementation issues regarding the integration of dataset filtering with well-known pattern discovery algorithms.
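As an illustration of the idea of dataset filtering, the following minimal sketch covers one simple case: mining frequent itemsets under the constraint that every itemset must contain a given item. The abstract does not spell out the paper's transformation rules, so the rule used here (keep only the transactions that contain the required item) is an assumption chosen for illustration, as are the function names, the brute-force miner, and the sample data. The sketch also assumes an absolute support count rather than a relative threshold, since a relative threshold would have to be rescaled after the dataset shrinks.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force miner (illustrative only): count every itemset that
    occurs in some transaction and keep those with count >= min_count."""
    counts = {}
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_count}


def mine_with_postfilter(transactions, min_count, required):
    """Baseline: mine the full dataset, then discard itemsets that
    violate the constraint."""
    mined = frequent_itemsets(transactions, min_count)
    return {s: c for s, c in mined.items() if required in s}


def mine_with_dataset_filtering(transactions, min_count, required):
    """Hypothetical filtering rule: a transaction without the required
    item cannot support any itemset that contains it, so it is dropped
    before mining. Support counts of qualifying itemsets are unchanged."""
    filtered = [t for t in transactions if required in t]
    mined = frequent_itemsets(filtered, min_count)
    return {s: c for s, c in mined.items() if required in s}


# Made-up sample data for the sketch.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

baseline = mine_with_postfilter(transactions, 2, "butter")
filtered = mine_with_dataset_filtering(transactions, 2, "butter")
print(baseline == filtered)  # both approaches yield the same itemsets
```

In this toy case the filtered dataset has three transactions instead of five, and both approaches return the same constrained result; the saving comes from the miner scanning fewer transactions.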
Related resources
Efficient Constraint-Based Sequential Pattern Mining Using Dataset Filtering Techniques
The basic formulation of the sequential pattern discovery problem assumes that the only constraint to be satisfied by discovered patterns is the minimum support threshold. However, users often want to restrict the set of patterns to be discovered by adding extra constraints on the structure of patterns. Data mining systems should be able to exploit such constraints to speed up the mining proce...
On Efficiency of Dataset Filtering Implementations in Constraint-Based Discovery of Frequent Itemsets
Discovery of frequent itemsets is one of the fundamental data mining problems. Typically, the goal is to discover all the itemsets whose support in the source dataset exceeds a user-specified threshold. However, very often users want to restrict the set of frequent itemsets to be discovered by adding extra constraints on size and contents of the itemsets. Many constraint-based frequent itemset ...
Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)
In this paper, we aim to improve association rules for use in recommenders. Recommender systems provide a way to create personalized offers. One of the most important types of recommender system is collaborative filtering, which mines user information and offers users appropriate items. Among the data mining methods, finding frequent item sets and ...
Preference-Based Frequent Pattern Mining
Frequent pattern mining is an important data mining problem with broad applications. Although there are many in-depth studies on efficient frequent pattern mining algorithms and constraint pushing techniques, the effectiveness of frequent pattern mining remains a serious concern: it is non-trivial and often tricky to specify appropriate support thresholds and proper constraints. In this paper, ...
Constraint-based Tree Pattern Mining
Most work on pattern mining focuses on simple data structures like itemsets or sequences of itemsets. However, many recent applications dealing with complex data, such as chemical compounds, protein structures, XML and Web log databases, and social networks, require much more sophisticated data structures (trees or graphs) for their specification. Here, interesting patterns involve not only frequent ob...
Publication date: 2002